Add opt-in PII redaction with typed tokens#397

Open
peyton-alt wants to merge 6 commits into main from feature/pii-redaction

@peyton-alt
Contributor

Closes #363

Adds opt-in PII redaction to the redact/ package. When enabled, emails, phone numbers, and custom-pattern matches are replaced with typed tokens ([REDACTED_EMAIL], [REDACTED_PHONE], etc.) before they are stored in metadata, transcripts, prompts, summaries, and context files on both shadow and metadata branches.

Secret redaction (API keys, tokens) remains always-on and unchanged.

Thanks to @ishaan812 for filing the issue and proposing the approach.

Add to .entire/settings.json:

{
  "redaction": {
    "pii": {
      "enabled": true,
      "email": true,
      "phone": true,
      "address": false,
      "custom_patterns": {
        "employee_id": "EMP-\\d{6}"
      }
    }
  }
}
  • email and phone default to true when PII is enabled
  • address defaults to false (higher false-positive rate)
  • custom_patterns maps a label to an arbitrary regex; the label becomes the token type, [REDACTED_<LABEL>]
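
To make the defaults listed above concrete, here is a minimal sketch of how they could be resolved when a key is omitted. The struct and function names (piiSettings, resolvePII, orDefault) are hypothetical and are not taken from this PR; the pointer fields are just one way to tell "not set" apart from an explicit false.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// piiSettings mirrors the redaction.pii object shown above.
// Hypothetical shape for illustration; the PR's real structs may differ.
type piiSettings struct {
	Enabled        bool              `json:"enabled"`
	Email          *bool             `json:"email"`   // nil means the key was omitted
	Phone          *bool             `json:"phone"`   // nil means the key was omitted
	Address        *bool             `json:"address"` // nil means the key was omitted
	CustomPatterns map[string]string `json:"custom_patterns"`
}

type settingsFile struct {
	Redaction struct {
		PII piiSettings `json:"pii"`
	} `json:"redaction"`
}

// effectivePII holds the categories that are active after defaults are applied.
type effectivePII struct{ Email, Phone, Address bool }

// orDefault returns the configured value, or def when the key was omitted.
func orDefault(v *bool, def bool) bool {
	if v == nil {
		return def
	}
	return *v
}

// resolvePII applies the documented defaults: email and phone on,
// address off, and everything off when PII redaction is not enabled.
func resolvePII(raw []byte) (effectivePII, error) {
	var f settingsFile
	if err := json.Unmarshal(raw, &f); err != nil {
		return effectivePII{}, err
	}
	p := f.Redaction.PII
	if !p.Enabled {
		return effectivePII{}, nil
	}
	return effectivePII{
		Email:   orDefault(p.Email, true),
		Phone:   orDefault(p.Phone, true),
		Address: orDefault(p.Address, false),
	}, nil
}

func main() {
	eff, err := resolvePII([]byte(`{"redaction":{"pii":{"enabled":true,"phone":false}}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", eff)
}
```

With only "enabled" and "phone" set, email falls back to true and address to false.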

Copilot AI review requested due to automatic review settings February 18, 2026 00:24
@cursor

cursor bot commented Feb 18, 2026

PR Summary

Medium Risk
Touches core redaction logic and changes what gets written into checkpoint metadata, which could affect data fidelity or produce false positives/negatives despite being opt-in and well-tested.

Overview
Adds opt-in PII redaction to the redact pipeline so emails/phones/addresses and custom regex patterns can be replaced with typed placeholders like [REDACTED_EMAIL] while keeping existing secret redaction always-on.

Introduces a new .entire/settings.json redaction.pii config (with field-level merging for settings.local.json) and ensures it’s initialized once at key entry points (hooks git startup and doctor). Metadata directory files written into checkpoint git trees now use the redacting blob writer so transcripts/prompts/etc. are sanitized before being committed.
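
As an illustration of what field-level merging for settings.local.json could look like, here is a small sketch. The piiConfig type and mergePII function are hypothetical, and merging custom patterns per key is an assumption; the PR may handle that map differently.

```go
package main

import "fmt"

// piiConfig is a hypothetical in-memory form of redaction.pii.
// Pointer fields are nil when a file does not mention the key.
type piiConfig struct {
	Enabled, Email, Phone, Address *bool
	CustomPatterns                 map[string]string
}

func boolPtr(v bool) *bool { return &v }

// pick prefers the local value when it is set, otherwise keeps the base value.
func pick(base, local *bool) *bool {
	if local != nil {
		return local
	}
	return base
}

// mergePII overlays local on base one field at a time, so a local file
// that sets only "email" leaves every other base field untouched.
func mergePII(base, local piiConfig) piiConfig {
	out := piiConfig{
		Enabled:        pick(base.Enabled, local.Enabled),
		Email:          pick(base.Email, local.Email),
		Phone:          pick(base.Phone, local.Phone),
		Address:        pick(base.Address, local.Address),
		CustomPatterns: map[string]string{},
	}
	// Assumption: custom patterns are merged per label, local wins on conflict.
	for label, pattern := range base.CustomPatterns {
		out.CustomPatterns[label] = pattern
	}
	for label, pattern := range local.CustomPatterns {
		out.CustomPatterns[label] = pattern
	}
	return out
}

func main() {
	base := piiConfig{
		Enabled:        boolPtr(true),
		Email:          boolPtr(true),
		CustomPatterns: map[string]string{"employee_id": `EMP-\d{6}`},
	}
	local := piiConfig{Email: boolPtr(false)}
	m := mergePII(base, local)
	fmt.Println(*m.Enabled, *m.Email, len(m.CustomPatterns))
}
```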

Written by Cursor Bugbot for commit 242ff5c.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds opt-in PII (Personally Identifiable Information) redaction to the Entire CLI, extending the existing secret redaction capability. When enabled via settings, emails, phone numbers, addresses, and custom patterns are replaced with typed tokens (e.g., [REDACTED_EMAIL], [REDACTED_PHONE]) in metadata files (transcripts, prompts, summaries) before they're stored in git. Secret redaction (API keys, tokens) remains always-on. The implementation adds a new redact/pii.go module with configurable PII detection, integrates it with the existing redaction flow, and adds comprehensive settings support with local override capability.

Changes:

  • Added opt-in PII detection with regex patterns for email, phone, and US street addresses, plus support for custom patterns
  • Extended redaction to use typed tokens ([REDACTED_<TYPE>]) for PII while maintaining backward-compatible REDACTED tokens for secrets
  • Integrated PII configuration loading from .entire/settings.json with proper defaults (email/phone default to true, address defaults to false)
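
For readers unfamiliar with the typed-token idea, a rough sketch follows. It is not the code in redact/pii.go: the email regex is a deliberately simple stand-in, redactPII and token are hypothetical names, and upper-casing the custom label to build the token is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// emailRe is a simplified email pattern used only for this sketch.
var emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// token builds a typed placeholder such as [REDACTED_EMAIL].
// Assumption: labels are upper-cased when they become token types.
func token(label string) string {
	return "[REDACTED_" + strings.ToUpper(label) + "]"
}

// redactPII replaces emails first, then each custom pattern in sorted
// label order so the result does not depend on map iteration order.
func redactPII(s string, custom map[string]string) (string, error) {
	s = emailRe.ReplaceAllLiteralString(s, token("email"))

	labels := make([]string, 0, len(custom))
	for label := range custom {
		labels = append(labels, label)
	}
	sort.Strings(labels)

	for _, label := range labels {
		re, err := regexp.Compile(custom[label])
		if err != nil {
			return "", fmt.Errorf("custom pattern %q: %w", label, err)
		}
		s = re.ReplaceAllLiteralString(s, token(label))
	}
	return s, nil
}

func main() {
	out, err := redactPII(
		"mail jane@example.com about EMP-123456",
		map[string]string{"employee_id": `EMP-\d{6}`},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

This sketch applies patterns one after another for brevity; the PR instead collects match regions and merges them, as the review comment further down discusses.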

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| redact/redact.go | Modified region merging to support labeled redaction tokens; added PII detection call |
| redact/pii.go | New file implementing PII detection with configurable categories and custom patterns |
| redact/pii_test.go | Comprehensive tests for PII detection, category toggles, and coexistence with secrets |
| cmd/entire/cli/settings/settings.go | Added RedactionSettings and PIISettings structs to support configuration |
| cmd/entire/cli/settings/settings_test.go | Added tests for redaction settings loading and local overrides |
| cmd/entire/cli/strategy/common.go | Added EnsureRedactionConfigured() to load PII settings before checkpoint writes |
| cmd/entire/cli/strategy/manual_commit_hooks.go | Added EnsureRedactionConfigured() call in PostCommit hook |
| cmd/entire/cli/strategy/manual_commit_git.go | Added EnsureRedactionConfigured() call in SaveChanges |
| cmd/entire/cli/strategy/auto_commit.go | Added EnsureRedactionConfigured() call in SaveChanges |
| cmd/entire/cli/checkpoint/temporary.go | Changed metadata file blob creation to use createRedactedBlobFromFile (code files unchanged) |
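
Several entry points in the table call EnsureRedactionConfigured(). One common way to make such a call safe to repeat is sync.Once, sketched below. Whether the PR actually uses sync.Once is an assumption, and loadPIIEnabled is a hypothetical stand-in for reading .entire/settings.json.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	configureOnce sync.Once
	loadCount     int  // counts how often settings were loaded (for the demo)
	piiEnabled    bool // the configured state consulted by later redaction
)

// loadPIIEnabled is a hypothetical stand-in for reading settings from disk.
func loadPIIEnabled() bool {
	loadCount++
	return true
}

// ensureRedactionConfigured loads settings the first time it is called
// and is a no-op afterwards, so every entry point can call it freely.
func ensureRedactionConfigured() {
	configureOnce.Do(func() {
		piiEnabled = loadPIIEnabled()
	})
}

func main() {
	// Simulate three entry points (e.g. two SaveChanges paths and a hook).
	ensureRedactionConfigured()
	ensureRedactionConfigured()
	ensureRedactionConfigured()
	fmt.Println(loadCount, piiEnabled)
}
```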

@peyton-alt peyton-alt force-pushed the feature/pii-redaction branch from ac2552e to 7c485e8 Compare February 18, 2026 00:53
@peyton-alt peyton-alt marked this pull request as ready for review February 18, 2026 01:51
@peyton-alt peyton-alt requested a review from a team as a code owner February 18, 2026 01:51
@peyton-alt peyton-alt force-pushed the feature/pii-redaction branch 3 times, most recently from 177fcb3 to 848f6cc Compare February 18, 2026 06:24
@peyton-alt peyton-alt force-pushed the feature/pii-redaction branch from 887e089 to 242ff5c Compare February 24, 2026 22:16
@peyton-alt
Contributor Author

@cursor review

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

 for _, r := range merged {
 	b.WriteString(s[prev:r.start])
-	b.WriteString("REDACTED")
+	b.WriteString(replacementToken(r.label))

Nondeterministic token for identical match spans

Medium Severity

String merges overlapping taggedRegions by keeping the first region's label, but the sort tie-breaker only considers start and end. When two detectors produce regions with identical [start,end] (e.g., the built-in email pattern and a custom pattern matching the same email), sort.Slice gives no guarantee about their relative order because it is not a stable sort, so the emitted token label can vary between runs.
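
One possible fix, sketched under assumptions: extend the comparator to break ties on the label so identical spans always sort the same way. The taggedRegion fields, the redact function, and the choice of alphabetical label order are hypothetical here; a priority scheme (for example, built-in patterns before custom ones) would work equally well.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// taggedRegion is a hypothetical half-open span [start, end) with a label.
type taggedRegion struct {
	start, end int
	label      string
}

// redact sorts regions with a total order (start, end, label), merges
// overlaps keeping the first region's label, and emits typed tokens.
func redact(s string, regions []taggedRegion) string {
	rs := append([]taggedRegion(nil), regions...) // do not mutate the caller's slice
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].start != rs[j].start {
			return rs[i].start < rs[j].start
		}
		if rs[i].end != rs[j].end {
			return rs[i].end < rs[j].end
		}
		// Tie-breaker: without this, identical spans have no defined order.
		return rs[i].label < rs[j].label
	})

	var merged []taggedRegion
	for _, r := range rs {
		if n := len(merged); n > 0 && r.start < merged[n-1].end {
			// Overlap: extend the previous region, keep its label.
			if r.end > merged[n-1].end {
				merged[n-1].end = r.end
			}
			continue
		}
		merged = append(merged, r)
	}

	var b strings.Builder
	prev := 0
	for _, r := range merged {
		b.WriteString(s[prev:r.start])
		b.WriteString("[REDACTED_" + strings.ToUpper(r.label) + "]")
		prev = r.end
	}
	b.WriteString(s[prev:])
	return b.String()
}

func main() {
	s := "to a@b.co now"
	a := []taggedRegion{{3, 9, "email"}, {3, 9, "contact"}}
	b := []taggedRegion{{3, 9, "contact"}, {3, 9, "email"}}
	// Both input orders produce the same output.
	fmt.Println(redact(s, a) == redact(s, b))
}
```

Because the comparator now distinguishes every pair of distinct regions, the result no longer depends on the order in which detectors report their matches.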

Additional Locations (1)

Development

Successfully merging this pull request may close these issues.

Scrub sensitive data (PII & secrets) from LLM context and checkpoints

1 participant